(54) Device and method for reducing delay jitter in data transmission



(57) A delay unit 103 adds a holding time, set by a holding time setting unit 104, to received
data. The holding time is computed from the delay time of the received data and the minimum delay
time of the data received up to that point, for the purpose of reducing the total delay time. The
delay time is estimated in a delay time estimating unit 106 from the difference between the
reception time of a packet, counted based on an internal clock generator 107, and the time
designated by a time stamp in the received packet.
Description

BACKGROUND OF THE INVENTION 
Field of Invention 

[0001] This invention relates to a delay jitter reducing device for sequentially receiving a series of chronological data 
segments through a transmission path such as the Internet and delaying an individual data segment for an appropriate 
amount of time, thereby reducing delay jitter that has occurred in the propagation process of an individual data segment 
and obtaining chronological data segments from which effects of the delay jitter have been eliminated; and a delay 
jitter reducing method thereof. 

Description of the Related Art 

[0002] One form of data transmission is real-time transmission, in which chronological samples of a continuous
signal such as a voice signal are loaded into a plurality of consecutive packets and transmitted. In such real-time
transmission, if the delay times in transmitting the individual packets are equal to one another, a voice signal of
the same waveform as at the source node can be obtained by reproducing the chronological samples in each packet at
the time the packet is received.

[0003] In a network such as the Internet, however, even in a case where a plurality of packets are transmitted from
an unchanged source node to an unchanged destination node, the propagation delay times of the individual packets are
not necessarily the same as one another; the propagation delay time varies among packets. This variation of the
propagation delay time among packets is generally called delay jitter.

[0004] In a case where such delay jitter occurs, when a chronological sample is reproduced from received packets 
at the point of receiving each packet at the destination node, it is not assured that a signal of the same waveform as 
the original transmission signal can be reproduced from the received packets. 

[0005] In such a case, destination nodes usually take a step of reducing delay jitter using buffers so as to obtain 
chronological data with effects of delay jitter eliminated. 

[0006] This technique for reducing delay jitter will be described in detail with reference to Fig. 12 to Fig. 17.
[0007] Fig. 12 is a block diagram showing a configuration example of a real-time voice transmission system. In the
system, at a source terminal 10, a voice signal to be transmitted is encoded by a voice encoder 11, and chronological
voice packets on which coded data of the voice signal are loaded are generated. A transmission unit 12 transmits these
individual voice packets to a destination terminal 30. Each voice packet arrives at the destination terminal 30 after
passing through a network 20. At the destination terminal 30, voice packets from the source terminal 10 are received
by a receiving unit 31 and stored in a buffer 32. Subsequently, the voice packets stored in the buffer 32 are read from
the buffer 32 in the same order in which they were generated at the source node and are transmitted to a voice decoder
33. The voice decoder 33 receives the voice packets transmitted in this way and decodes the voice signal from the coded
data included in the voice packets.

[0008] In the real-time voice transmission system, each voice packet generated in the source terminal 10 is sent out
to the network 20 at the same time interval as that at which the packets are generated. However, as described already,
the propagation delay time required for these individual packets to reach the destination terminal 30 is not the same
for each voice packet. Such being the case, the destination terminal 30 adjusts the timing for sending individual
voice packets to the voice decoder 33. Fig. 17 shows an example of this timing adjustment. In the example shown in
Fig. 17, voice packets P0, P1, and P2 arrive at the destination terminal 30, having taken propagation delay times of
d0, d1, and d2, respectively. As shown, if the voice packets P0, P1, and P2 can be delayed by D0, D1, and D2, each an
appropriate amount of time for the respective packet, the total delay time T can in turn be fixed, where the total
delay time is the amount of time required for each voice packet to travel from the source terminal 10 to the voice
decoder 33. The buffer 32 shown in Fig. 12 is a device used for adjusting delays in order to fix the total delay time
of each voice packet in this way. Assuming the minimum delay time of a voice packet in the network 20 to be dmin and
the maximum delay time to be dmax, the difference between them, D=dmax-dmin, is referred to as the delay jitter width
as a matter of convenience. The buffer 32 in Fig. 12 is required to absorb the variation of delay time within the
range of this delay jitter width; in other words, the buffer 32 should be capable of reducing the delay jitter.
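
The relation described in paragraph [0008] can be illustrated with a minimal sketch. It assumes that the propagation delays are known, which the following paragraphs explain is not the case in practice; the function name and the sample values are hypothetical and are chosen only to show that adding a per-packet buffering delay of T minus the propagation delay makes the total delay the same for every packet.

```python
def buffer_delays(propagation_delays, total_delay):
    """Return the buffering delay D_i = T - d_i for each packet."""
    if total_delay < max(propagation_delays):
        raise ValueError("total delay T must be at least the largest propagation delay")
    return [total_delay - d for d in propagation_delays]


# Hypothetical propagation delays d0, d1, d2 (in units of s) and target total delay T.
delays = [3, 5, 4]
T = 5
extra = buffer_delays(delays, T)
print(extra)                                    # [2, 0, 1]
print([d + e for d, e in zip(delays, extra)])   # [5, 5, 5]
```

Every packet ends up with the same total delay T, so the decoder receives the packets at the original transmission interval.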

[0009] Delay adjustment of a voice packet by the buffer 32 will be described hereinafter with reference to Figs.
13A and 13B.

[0010] In Fig. 13B, four queues are shown one above another, each queue consisting of a row of nine boxes. The first
queue indicates the state of the buffer 32 at a certain time t1. The second queue indicates the state of the buffer 32
at time t2, which is 1s later than time t1. Likewise, the third and fourth queues indicate the states of the buffer 32
at time t3, which is 1s later than time t2, and at time t4, which is 1s later than time t3, respectively.
[0011] In the example shown in Fig. 13B, the buffer 32 has a capacity of storing nine voice packets. Each of the nine
boxes in each queue is an area for storing a voice packet, and the notation, #1 to #9, in each box indicates the address
of each area.

[0012] In the destination terminal 30, one voice packet is read from the buffer 32 every 1s and sent to the voice
decoder 33, where "s" is a unit of time suited to the data attribute, such as several milliseconds or several dozen
milliseconds. The address of the area from which a voice packet is read is likewise advanced by one address every
fixed time 1s. In Fig. 13B, the area from which a voice packet is currently being read is shown at the right end of
each queue, the area immediately to its left is where the readout is performed 1s later, and the area second to its
left is where the readout is performed 2s later. The other areas follow likewise; thus, the area at the leftmost end
of the queue is the area from which a voice packet is read 8s later.

[0013] In the example shown in Fig. 13B, a voice packet is read from the area of address #1 at time t1, from the area
of address #2 at time t2, from the area of address #3 at time t3, and from the area of address #4 at time t4.
Therefore, if a voice packet received at time t1 is written into the area of address #4, the voice packet is output
from the buffer 32 to the voice decoder 33 at time t4, which is 3s later. Also, if a voice packet received at time t1
is written into the area of address #9, the voice packet is output from the buffer 32 to the voice decoder 33 8s later.
In this way, controlling the write address into which a received voice packet is written enables the voice packet to
be delayed by an arbitrary amount in the range of 0s to 8s.
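
The address arithmetic of paragraph [0013] may be sketched as follows. This is only an illustration: the function name is hypothetical, and the wrap-around from address #9 back to address #1 is an assumption, since the text does not state how the read address continues after the last area.

```python
CAPACITY = 9  # number of areas in the buffer of Fig. 13B (addresses #1 to #9)


def write_address(read_address, delay_units, capacity=CAPACITY):
    """Return the address to write to so that readout occurs delay_units later.

    Addresses are numbered 1..capacity as in Fig. 13B. Wrapping from the last
    address back to #1 is an assumption made for this sketch.
    """
    if not 1 <= read_address <= capacity:
        raise ValueError("read address out of range")
    if not 0 <= delay_units <= capacity - 1:
        raise ValueError("delay must be between 0 and capacity - 1 units")
    return (read_address - 1 + delay_units) % capacity + 1


print(write_address(1, 3))  # 4: written at t1, read at t4, i.e. 3s later
print(write_address(1, 8))  # 9: read 8s later
print(write_address(7, 4))  # 2: assumed wrap-around past address #9
```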
[0014] Therefore, provided that the absolute delay time from the transmission of each voice packet by the source
terminal 10 until its arrival at the destination terminal 30 could be obtained, delaying each voice packet by the
amount obtained by subtracting that absolute delay time from the maximum delay time to be reduced (dmax shown in
Fig. 17) would make it possible to fix, and also to minimize, the total delay time of each voice packet from the
source terminal 10 to the voice decoder 33.

[0015] However, the destination terminal 30 is not capable of finding how much propagation delay time each voice
packet has taken to reach the destination. As a consequence, conventional delay control for each packet is
performed by the following method. For simplicity, we assume here that a series of voice packets transmitted from the
source terminal 10 at a certain time interval reaches the destination terminal 30 in the same order as the transmission
order.

[0016] First of all, the destination terminal 30, upon receiving a first voice packet through the network 20, writes
the voice packet into an initial input location of the buffer 32 (S1, S2 of Fig. 13A). In the example shown in
Fig. 13B, the initial input location is the area whose address is one larger than that of the area from which a voice
packet is read at the point of receiving the first voice packet.

[0017] Then, each voice packet from the second packet onward is written in the area where the readout is performed at
the earliest timing among the areas that are vacant at the point of receiving that voice packet (S3 of Fig. 13A).
[0018] In the example shown in Fig. 13B, the first voice packet P1 received at time t1 is written in the area of
address #2, which is the initial input location. Then, no voice packet is received at time t2, and the voice packet P1
is read from the area of address #2 and sent to the voice decoder 33. At time t3, a second voice packet P2 is received;
it appears to have taken a delay time 1s longer than that of the voice packet P1 to be transmitted. The voice packet P2
is written in the area where the readout is performed at the earliest timing among the vacant areas at the receiving
time t3, that is, the area of address #3. Subsequently, at time t3, the voice packet P2 is read immediately after being
written and is supplied to the voice decoder 33.

[0019] Thus, even if the voice packets P1 and P2 are transmitted from the source terminal 10 at a time interval of
1s, the difference of 1s in propagation delay time between the two voice packets causes them to arrive at the
destination terminal 30 at a time interval of 2s. However, even in such a case, determining the initial input location
of the buffer 32 and applying deference by the buffer 32 as described above enables the voice packets P1 and P2 to be
supplied to the voice decoder 33 at the same time interval as the transmission interval of the source terminal 10. In
other words, delay jitter as large as 1s can be reduced by allotting, as the initial input location for the first
voice packet, an area which will be output later than the area being read at the time of reception by an amount
equivalent to 1s.
[0020] Looking at a group of serial voice packets transmitted from the source terminal 10 to the destination terminal
30, their propagation delay times vary from the minimum value dmin to the maximum value dmax, as shown for example
in Fig. 17. In the conventional art, when a first voice packet P1 is received at the destination terminal 30, the
initial input location is allotted to the area whose address will be output later than the readout address at the time
of reception by the number of areas equivalent to the delay jitter width D=dmax-dmin, and the voice packet P1 is
written therein. Deciding the initial input location in this way enables the complete elimination of the pre-assumed
delay jitter.
[0021] A more detailed description will be given hereinafter with reference to Figs. 14A, 14B, 14C, 15 and 16. In the
following description, it is assumed that the delay jitter width is 4s. Also, for the sake of simplicity, we will
assume a case where the minimum delay time dmin is 0s and the delay jitter width of the network 20 is equal to the
maximum delay time dmax.

[0022] In Fig. 14A, the voice packets P11 and P12 are packets output consecutively from the voice encoder 11 of the
source terminal 10. Likewise, the voice packets P21 and P22 are packets output consecutively from the voice encoder
11 of the source terminal 10. Fig. 14B illustrates each voice packet that has reached the receiving unit 31 of the
destination terminal 30. In the example shown, the voice packets P11 and P12 reach the receiving unit 31, both being
delayed by the maximum delay time dmax=4s. On the other hand, the voice packets P21 and P22 reach the receiving
unit 31, the former being delayed by the minimum delay time dmin=0s and the latter by the maximum delay time
dmax=4s. Fig. 14C illustrates each of the voice packets being supplied to the voice decoder 33 after deference has
been applied.

[0023] Fig. 15 shows how deference is applied to the packets P11 and P12 by the buffer 32, and Fig. 16 shows
how deference is applied to the packets P21 and P22 by the buffer 32.

[0024] As shown in Fig. 15, the voice packet P11 that has reached the receiving unit 31 at time t5 is written in the
area of address #5, which is the initial input location, thereby being delayed by 4s and output from the buffer 32 to
the voice decoder 33 at time t9. Then, the voice packet P12 that has reached the receiving unit 31 at time t6 is
written in the area of address #6, the area where a readout will be performed at the earliest timing among the areas
vacant at the time of reception, thereby being output from the buffer 32 at time t10, which is the next output timing
after that of the voice packet P11.

[0025] On the other hand, deference is performed as follows for the voice packets P21 and P22. First of all, as
shown in Fig. 16, the voice packet P21 that has reached the receiving unit 31 at time t1 is written in the area of
address #5, which is the initial input location, thereby being delayed by 4s and output from the buffer 32 at time t5.
Then, the voice packet P22 that has reached the receiving unit 31 at time t6 is written in the area where a readout
will be performed at the earliest timing among the areas vacant at the time of reception, thereby being output
immediately from the buffer 32.

[0026] As described so far, if the initial input location is set to the area whose address will be output later than
the read address at the time of reception by the number of areas equivalent to the delay jitter width D=dmax-dmin, it
becomes possible to reduce every delay jitter in the range between the minimum value dmin and the maximum value dmax.

[0027] However, in the conventional art described above, delaying the first voice packet received by the destination
terminal 30 by a delay time equivalent to the delay jitter width D means that the same amount of delay is applied to
the succeeding voice packets. If the delay time required for the first voice packet to pass through the network is
assumed to be d0, the total delay time T will be D+d0, where the total delay time T designates the amount of time from
the point at which each voice packet is output from the voice encoder 11 of the source terminal 10 until it reaches the
voice decoder 33 of the destination terminal 30. However, the delay time of the first voice packet varies from the
minimum value dmin to the maximum value dmax, which in turn makes the total delay time T dependent on the delay time
d0 of the first voice packet. That means that, when the delay time d0 of the first voice packet is the minimum delay
time dmin, the total delay time T can be made short. However, when the delay time of the first voice packet is as long
as the maximum delay time dmax, the total delay time T becomes as long as twice the maximum delay time dmax. In recent
years, the spread of services such as Internet telephony using VoIP (Voice over IP) techniques has created a demand for
high-quality communication, which requires shortening of the total delay time. Thus, it is unfavorable that the total
delay time T becomes long for the sake of reducing delay jitter.
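
The dependence of the conventional total delay on the first packet can be checked with a short sketch based on the relation T=D+d0 stated in paragraph [0027]. The function name is hypothetical; the values are those of the examples of Figs. 15 and 16 as described above (dmin=0s, dmax=4s, D=4s).

```python
def conventional_total_delay(first_packet_delay, jitter_width):
    """Total delay T = D + d0 under the conventional initial-input-location scheme."""
    return jitter_width + first_packet_delay


D = 4  # delay jitter width in units of s, with dmin = 0 and dmax = 4

# Fig. 15 case: the first packet P11 is delayed by dmax = 4s, so T = 8s = 2 * dmax.
print(conventional_total_delay(4, D))  # 8

# Fig. 16 case: the first packet P21 is delayed by dmin = 0s, so T = 4s.
print(conventional_total_delay(0, D))  # 4
```

The same network conditions therefore yield a total delay of either 4s or 8s, depending only on how long the first packet happened to take.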

Summary of the Invention 

[0028] This invention is made for solving the above-mentioned problem and aims at providing a delay jitter reducing
device capable of shortening the total delay time and a delay jitter reducing method thereof.

[0029] In order to solve the above-mentioned problem, this invention provides a delay jitter reducing device,
comprising: a receiving unit sequentially receiving chronological data segments through a network; a time detecting
unit for obtaining a reception time of each data segment received by said receiving unit; transmission time estimating
means for estimating transmission time of each data segment received by said receiving unit; a delay time estimating
unit for estimating a delay time required for transmitting each data segment based on said reception time and said
transmission time of each data segment; a minimum delay time estimating unit for estimating a minimum delay time in
transmitting a data segment through the network from the estimated values of delay time of a plurality of data
segments obtained from said delay time estimating unit; relative delay time computing means for obtaining a relative
delay time by subtracting said minimum delay time from the estimated value of delay time of a data segment estimated
by said delay time estimating unit; and delay means for obtaining an amount of holding time corresponding to each data
segment by subtracting the relative delay time of each data segment from a maximum delay time to be reduced, and
outputting each data segment after delaying each data segment for the amount of holding time corresponding to each
data segment.

[0030] Such a delay jitter reducing device estimates the minimum value of the delay time required for transmitting
data segments such as packets, and determines the holding time of deference for reducing delay jitter based on that
minimum value. As a result, the delay jitter of a group of received data segments is reduced and the total delay time
thereof is shortened.

[0031] The embodiments of the present invention include an embodiment of producing and selling a device which reduces
delay jitter as disclosed in the above-mentioned embodiments, an embodiment of distributing, through a
telecommunication line, a program for making a network-connected computer function as a delay jitter reducing device
as disclosed in the above embodiments, and an embodiment of distributing such a program recorded in a
computer-readable recording medium.

Brief Description of the Drawings 

[0032]

Fig. 1 is a block diagram showing an overall configuration of a real-time voice transmission system with respect
to a first embodiment of the present invention.

Fig. 2 is a block diagram showing a configuration of a destination terminal in the embodiment.

Fig. 3 is a block diagram showing a configuration of a delay unit in the embodiment.

Fig. 4 is a block diagram showing a configuration of a voice packet in the embodiment.

Fig. 5 is a time chart illustrating an operation of the destination terminal in the embodiment.

Fig. 6 is a block diagram showing a configuration of a destination terminal in a second embodiment of the present
invention.

Fig. 7 is a diagram showing a packet notifying the start of a non-voice section.

Fig. 8 is a time chart showing an operation of the embodiment.

Fig. 9 is a flow chart illustrating an operation of the embodiment.

Figs. 10A, 10B, and 10C are time charts illustrating an operational example of the embodiment.

Figs. 11A and 11B illustrate an effect of the embodiment.

Fig. 12 is a block diagram showing a configuration example of a real-time voice transmission system.

Fig. 13A is a flow chart illustrating an operation of the system.

Fig. 13B is a time chart illustrating the operation of the system.

Figs. 14A, 14B, and 14C are time charts illustrating an example of the system.

Fig. 15 is an operational example of the system.

Fig. 16 is an operational example of the system.

Fig. 17 is an operational example of the system.

Detailed Description 

[0033] An embodiment of the present invention will be described hereinafter with reference to the drawings.
A. First Embodiment 

[0034] Fig. 1 is a block diagram showing a configuration of a real-time voice transmission system that is a first
embodiment of the present invention. In the real-time voice transmission system, there is provided a source terminal
10 with a voice encoder 11 and a transmission unit 12, as in the conventional art. The source terminal 10 and a
destination terminal 100 are both VoIP terminals. This real-time voice transmission system is for providing an Internet
telephone service to a user.

[0035] Fig. 2 is a block diagram showing a configuration of the destination terminal 100. In this figure, a receiving
unit 101 is a device which receives voice packets from the source terminal 10 through the Internet 20. A packet
terminating unit 102 is a device that terminates a protocol of the Internet 20. A voice packet received by the
receiving unit 101 is transmitted through the packet terminating unit 102 to a time stamp detecting unit 108 and a
delay time estimating unit 106. Also, the packet terminating unit 102 fetches coded voice data from the payload section
of the received voice packet and supplies the data to a delay unit 103.
[0036] An internal clock generator 107 generates an internal clock of a certain frequency and supplies the generated
clock to the delay time estimating unit 106 and the delay unit 103.

[0037] The delay unit 103 is supplied with data of holding time from a holding time setting unit 104. How the data of
holding time is generated will be described later. The delay unit 103 is a device that holds coded voice data supplied
from the packet terminating unit 102 and then supplies the data to a voice decoder 110. The delay unit 103, as shown
for example in Fig. 3, comprises a RAM 103A, a write circuit 103B for writing coded voice data supplied from the packet
terminating unit 102 into the RAM 103A, and a read circuit 103C for reading out coded voice data from the RAM 103A.
The read circuit 103C counts the internal clock supplied from the internal clock generator 107, supplies the counted
value to the RAM 103A as a read address, reads out coded voice data from the area in the RAM






EP1 143 671 A2 

103A corresponding to the read address, and outputs the data to the voice decoder 110. When coded voice data of a
voice packet is output from the packet terminating unit 102, the write circuit 103B obtains a write address based on
the read address that is output from the read circuit 103C at that point and the data of holding time that is output
from the holding time setting unit 104. Then, the write address is supplied to the RAM 103A, and the coded data of the
voice packet is written into the area of the RAM 103A corresponding to the write address. The coded voice data written
in the RAM 103A are read from the RAM 103A and output to the voice decoder 110 when the time corresponding to the data
of holding time has elapsed.
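
The cooperation of the write circuit and the read circuit described in paragraph [0037] can be modelled by the following minimal sketch. It is not the circuit of Fig. 3 itself: the class and method names are hypothetical, the holding time is expressed in clock ticks, and the wrap-around of the read counter at the end of the RAM is an assumption.

```python
class DelayUnit:
    """Toy model of a delay unit: a RAM, a free-running read counter and a writer."""

    def __init__(self, size):
        self.ram = [None] * size   # stands in for the RAM 103A
        self.read_address = 0      # counted value used as the read address

    def write(self, data, holding_ticks):
        """Write data so that it is read out holding_ticks ticks from now."""
        if not 0 <= holding_ticks < len(self.ram):
            raise ValueError("holding time does not fit in the RAM")
        # Write address = current read address + holding time (wrap-around assumed).
        address = (self.read_address + holding_ticks) % len(self.ram)
        self.ram[address] = data

    def tick(self):
        """Read out the current area, clear it, and advance the read counter."""
        data = self.ram[self.read_address]
        self.ram[self.read_address] = None
        self.read_address = (self.read_address + 1) % len(self.ram)
        return data


unit = DelayUnit(size=16)
unit.write("P0", holding_ticks=3)
print([unit.tick() for _ in range(4)])  # [None, None, None, 'P0']
```

In this model a holding time of 0 ticks means the data is output at the very next readout, and larger holding times push the data further ahead of the read counter.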

[0038] The voice decoder 110 is a device which decodes voice data from the coded data that are output from the delay
unit 103.

[0039] The time stamp detecting unit 108, the delay time estimating unit 106, a minimum delay time estimating unit
105, and the holding time setting unit 104 cooperate to form a means for generating data of holding time.
[0040] As described already, the time stamp detecting unit 108 is supplied with voice packets received by the
receiving unit 101. The source terminal 10 (Fig. 1), where the voice packets originate, contains a counter that
counts a clock of predetermined frequency and outputs time data designating the current time. The source terminal 10
reads the time data from the counter at the point of generating a voice packet and includes the time data in the header
of the voice packet as a time stamp. Fig. 4 is an example of a voice packet with such a time stamp in the header. The
time stamp detecting unit 108 fetches the time stamp from the received voice packet and sends it to the delay time
estimating unit 106.

[0041] The internal clock generator 107 outputs an internal clock of the same frequency as that of the clock used in
the source terminal 10. The delay time estimating unit 106 counts the internal clock that is output from the internal
clock generator 107 and generates time data designating the current time. This time data almost coincides with the time
data generated in the source terminal 10, but complete coincidence is not assured. However, both sets of time data are
generated by counting clocks of identical frequency, so the time difference between the two sets of time data is
fixed. When a time stamp of a voice packet is supplied from the time stamp detecting unit 108, the delay time
estimating unit 106 obtains an estimated value of the delay time required for the transmission of the voice packet by
subtracting the time stamp from the time data of the reception time of the voice packet.
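
The consequence of the fixed time difference noted in paragraph [0041] can be illustrated with a small sketch: the estimated delays are all shifted by the same unknown offset, so the delay of each packet relative to the smallest estimate is unaffected. The offset, send times and true delays below are hypothetical values chosen for illustration.

```python
def estimate_delay(reception_time, time_stamp):
    """Estimated delay: receiver clock reading minus the sender's time stamp."""
    return reception_time - time_stamp


CLOCK_OFFSET = 100         # hypothetical fixed offset of the receiver clock
send_times = [0, 1, 2]     # time stamps written by the source terminal
true_delays = [7, 9, 6]    # hypothetical true propagation delays

estimates = [
    estimate_delay(t + d + CLOCK_OFFSET, t)
    for t, d in zip(send_times, true_delays)
]
relative = [e - min(estimates) for e in estimates]
true_relative = [d - min(true_delays) for d in true_delays]

print(estimates)                  # [107, 109, 106]
print(relative)                   # [1, 3, 0]
print(relative == true_relative)  # True
```

Each estimate is wrong by the same 100 units, yet the relative delays match the true ones, which is why the destination terminal does not need a clock synchronized to the source terminal.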
[0042] The minimum delay time estimating unit 105 is a device for estimating the minimum delay time required for the
transmission of a voice packet. The minimum delay time estimating unit 105 sequentially receives from the delay time
estimating unit 106 the estimated values of delay time of the voice packets that have been received in sequence by the
receiving unit 101. Every time the minimum delay time estimating unit 105 receives an estimated value, it selects the
smallest value among the estimated values of delay time up to that point and regards the selected value as the
estimated value of the minimum delay time.

[0043] The holding time setting unit 104 is a device which, every time a voice packet Pi (i=0,1,2, ...) is received,
computes data of holding time da corresponding to the voice packet Pi from the equation below:

da=dmin+D-di (1)

where di is the delay time of the voice packet Pi estimated by the delay time estimating unit 106, dmin is the minimum
delay time of all the voice packets up to the voice packet Pi, and D is a pre-set maximum delay time.
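
Equation (1), together with the running minimum of paragraph [0042], may be sketched as follows. The class name and the sample delay values are hypothetical; the sketch only shows the arithmetic and does not address what should happen if a relative delay exceeds D, which the text does not discuss.

```python
class HoldingTimeSetter:
    """Computes da = dmin + D - di while tracking the smallest delay seen so far."""

    def __init__(self, max_delay):
        self.max_delay = max_delay  # D, the pre-set maximum delay time
        self.min_delay = None       # dmin, unknown until the first packet arrives

    def holding_time(self, delay):
        """Update dmin with the new estimate di and return the holding time da."""
        if self.min_delay is None or delay < self.min_delay:
            self.min_delay = delay
        return self.min_delay + self.max_delay - delay


setter = HoldingTimeSetter(max_delay=12)
# Hypothetical estimated delays of three consecutive packets, in units of s.
print([setter.holding_time(d) for d in [7, 9, 6]])  # [12, 10, 12]
```

A packet that sets a new minimum is held for the full D, while a packet that arrived later than the current minimum is held for correspondingly less time.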

[0044] The data of holding time da is used in computing a write address for writing a coded voice data unit of a voice 
packet into the RAM 103A, as described above. 

[0045] Fig. 5 is a diagram showing an operation of the present embodiment. An operation of the present embodiment 
will be described with reference to the figure. 
[0046] In the destination terminal 100, when a first voice packet P0 is received, the delay time estimating unit 106
calculates an estimated value of delay time according to the following equation from the reception time c0 and the
time t0 designated by the time stamp fetched from the voice packet P0:

d0=c0-t0 (2)

from which, in the example shown, the delay time of the first voice packet P0 is found to be 7s.
[0047] Then, the minimum delay time estimating unit 105 regards d0=7s as the initial estimated value of the minimum
delay time dmin.

[0048] Subsequently, the holding time setting unit 104 obtains data of holding time da corresponding to the voice
packet P0 as follows:

da=dmin+D-d0
=7s+12s-7s
=12s (3)

where D is set to 12s in this example.
[0049] The data of holding time da obtained by the holding time setting unit 104 is sent to the delay unit 103. The
delay unit 103 delays the coded voice data of the voice packet P0 for an amount of time equivalent to the data of
holding time da and then supplies the coded data to the voice decoder 110.

[0050] When a subsequent voice packet Pi is received at a later time, the delay time estimating unit 106 calculates
an estimated value of delay time according to the following equation from the reception time ci and the time ti
designated by the time stamp fetched from the voice packet Pi:

di=ci-ti (4)

[0051] Then, the minimum delay time estimating unit 105 compares di against the estimated value of the minimum
delay time dmin as of that point. When di≥dmin, the current estimated value dmin of the minimum delay time is
maintained; when di<dmin, dmin is replaced with the value of di.

[0052] The holding time setting unit 104 computes data of holding time da corresponding to the voice packet Pi from
the aforementioned equation (1). Then, the delay unit 103 delays the coded voice data of the voice packet Pi for an
amount of time equivalent to the data of holding time da and then supplies the coded data to the voice decoder 110.

[0053] The above operation is performed for all the voice packets.

[0054] At the beginning of a session, the estimated value of the minimum delay time dmin is updated relatively often.
However, the more voice packets are received and the more times the minimum delay time is estimated, the closer
the estimated value of the minimum delay time dmin comes to the true value of the minimum delay time. Therefore,
the time interval between updates of the estimated value of the minimum delay time dmin becomes longer, and the
estimated value becomes stable. In the example shown, the estimated value of the minimum delay time dmin changes to 7s
at the point of receiving the voice packet P0, 6s at the point of receiving the voice packet P2, 4s at the point of
receiving the voice packet P6, and 3s at the point of receiving the voice packet P12.

[0055] The total delay time T from the output of a voice packet from the voice encoder 11 of the source terminal 10
until the output of its coded voice data to the voice decoder 110 of the destination terminal 100 is obtained from the
following equation:

T=di+da
=di+dmin+D-di
=dmin+D (5)

[0056] As shown for example in Fig. 5, as more voice packets are received, the estimated value of the minimum delay
time dmin gradually converges to a small value. As a result, the total delay time T also gradually converges to a
small value.

[0057] Since the total delay time T depends on the estimated value of the minimum delay time, the total delay time T
changes relatively often at the beginning of a session. However, the more voice packets are received, the longer the
time interval between updates of the total delay time T becomes, and the value of the total delay time T finally
reaches a minimum value.
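
The convergence described in paragraphs [0054] to [0057] can be traced with a short sketch. The individual delay values below are hypothetical; they are chosen only so that the estimated minimum changes at the packets P0, P2, P6 and P12 to 7s, 6s, 4s and 3s as stated in the text, with D=12s as in the example of equation (3).

```python
from itertools import accumulate

D = 12  # pre-set maximum delay time in units of s

# Hypothetical estimated delays d0..d12 for the packets P0..P12.
delays = [7, 8, 6, 9, 7, 6, 4, 5, 8, 4, 6, 5, 3]

# Running minimum dmin after each packet, and total delay T = dmin + D (equation (5)).
running_min = list(accumulate(delays, min))
total_delay = [m + D for m in running_min]

# Packets at which the estimate of dmin (and hence T) changes.
updates = [
    (i, m, m + D)
    for i, m in enumerate(running_min)
    if i == 0 or m < running_min[i - 1]
]
print(updates)  # [(0, 7, 19), (2, 6, 18), (6, 4, 16), (12, 3, 15)]
```

The updates become less frequent as the session proceeds, and T settles toward the sum of D and the true minimum delay.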

B. Second Embodiment 

[0058] Fig. 6 is a block diagram showing a configuration of a destination terminal 100 with respect to a second
embodiment of the present invention. The destination terminal 100 in this embodiment further contains a non-voice
section detecting unit 109 in addition to the components of the destination terminal 100 of the first embodiment. The
non-voice section detecting unit 109 monitors the payload of voice packets received in sequence and detects non-voice
sections. To describe this in further detail, when a user of a source terminal 10 in the present embodiment stops
vocalization and a non-voice section, in which there is no voice to be transmitted, begins, the source terminal 10
transmits to the destination terminal 100 a voice packet which includes, in the payload, information designating the
start of the non-voice section, as shown in Fig. 7. The non-voice section detecting unit 109 of the destination
terminal 100 detects the start of a non-voice section by receiving this voice packet. When the destination terminal 100
receives, at a later time, a voice packet including some kind of coded voice data in the payload, the non-voice section
detecting unit 109 detects the end of the non-voice section.

[0059] Subsequently, when the end of the non-voice section is detected by the non-voice section detecting unit 109,
a holding time setting unit 104 in the present embodiment computes data of holding time da from an estimated
value of delay time for the first voice packet of the voice section obtained from a delay time estimating unit 106, an estimated
value of the minimum delay time obtained from a minimum delay time estimating unit 105 at that point, and a known
delay jitter width, and outputs the result to a delay unit 103. The computing of the data of holding time and the supplying
of the data to the delay unit 103 are performed every time a non-voice section ends.

[0060] Fig. 8 is a time chart showing an operation of the destination terminal 100 according to the present
embodiment, and Fig. 9 is a flow chart showing an operation of the destination terminal 100 according to the present
embodiment. The operation of the present embodiment will be described hereinafter with reference to these figures.
[0061] When a phone-to-phone conversation between the source terminal 10 and the destination terminal 100 is
initiated, a voice section and a non-voice section are repeated alternately as shown in Fig. 8, the voice section being a
period where voice packets representing the voice of a caller are received by the destination terminal 100 and the
non-voice section being a period where no voice packets are received.

[0062] As in the first embodiment, every time a voice packet is received by the receiving unit 101, the delay time
estimating unit 106 obtains an estimated value of delay time for the voice packet (steps S101 and S102).
[0063] In a first voice section SP0, the minimum delay time estimating unit 105 considers the estimated value of delay
time for a first voice packet P0 to be the estimated value of the minimum delay time dmin (steps S103 and S104). For
each of the voice packets received in the first voice section SP0, data of holding time da is computed from the
aforementioned equation (1), and the result is set in the delay unit 103 (step S105). In the delay unit 103, a write
address is determined from the data of holding time da and the read address of a RAM 103A at that point. Then, the coded
voice data of the voice packet are written into the area of the RAM 103A corresponding to the write address. After
an amount of time equivalent to the data of holding time da has elapsed, the coded voice data are read from the
RAM 103A and supplied to a voice decoder 110 (step S106).
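
The way the delay unit 103 turns a holding time into a write address can be pictured with the following Python sketch. The layout of the RAM 103A is not specified in this description, so the sketch assumes a circular buffer with one slot per time unit and a read address that advances by one slot per tick. The class name `DelayBuffer`, its methods, and the buffer size are hypothetical.

```python
class DelayBuffer:
    """Toy model of a delay unit backed by a circular RAM (assumed layout)."""

    def __init__(self, size):
        self.ram = [None] * size
        self.read_addr = 0  # slot that will be read on the next tick

    def write(self, data, holding_time):
        # Assumption: one slot per time unit, so the write address is the
        # current read address advanced by the holding time.
        if not 0 <= holding_time < len(self.ram):
            raise ValueError("holding time does not fit in the buffer")
        addr = (self.read_addr + holding_time) % len(self.ram)
        self.ram[addr] = data
        return addr

    def tick(self):
        # Read and clear the current slot, then advance the read address.
        data = self.ram[self.read_addr]
        self.ram[self.read_addr] = None
        self.read_addr = (self.read_addr + 1) % len(self.ram)
        return data


buf = DelayBuffer(size=8)
buf.write("P0", holding_time=3)
outputs = [buf.tick() for _ in range(5)]
print(outputs)
```

Under these assumptions, data written with a holding time of 3 is read out only after three ticks have elapsed, which corresponds to the read-after-holding-time behaviour of step S106.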

[0064] Then, when a voice packet as illustrated in Fig. 7 is received by the receiving unit 101, the non-voice section
detecting unit 109 detects the start of a non-voice section NP0. Instead of transmitting a packet for notifying the start
of a non-voice section from the source terminal 10 to the destination terminal 100 in this way, it is also possible to
detect the start of a non-voice section when no voice packet is received at the destination
terminal 100 over a certain period.
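
The alternative detection method mentioned in paragraph [0064], in which a non-voice section is assumed to have started when no voice packet arrives for a certain period, might be sketched as follows. The function name `find_silence_starts`, the `timeout` parameter, and the arrival times are hypothetical; the description does not state how long the waiting period should be.

```python
def find_silence_starts(arrival_times, timeout):
    """Return the times at which a non-voice section would be declared."""
    starts = []
    for prev, nxt in zip(arrival_times, arrival_times[1:]):
        # A gap longer than the timeout means that no packet arrived
        # within the waiting period after the previous packet.
        if nxt - prev > timeout:
            starts.append(prev + timeout)
    return starts


# Hypothetical arrival times with one long gap between 3 and 9.
silence = find_silence_starts([0, 1, 2, 3, 9, 10, 11], timeout=2)
print(silence)
```

The sketch works on a recorded list of arrival times for clarity; a real receiver would instead run a timer that is restarted on every received packet.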

[0065] We assume that the voice section SP0 changes to the non-voice section NP0 and that a subsequent voice
section SP1 begins at a later time. When a first voice packet P0 of the voice section SP1 is received by the receiving
unit 101, the delay time estimating unit 106 obtains an estimated value of delay time d0 of the voice packet P0 (steps S101
and S102 in Fig. 9).

[0066] Subsequently, the minimum delay time estimating unit 105 estimates a minimum delay time dmin from among
the estimated values of delay time for all the voice packets that have been received up to that point (step S104). In the
present embodiment, the estimated value of the minimum delay time can be updated only when a first voice packet of
a voice section is received. In other words, once a voice section begins, the estimated value of the minimum delay
time is not updated even if a delay time smaller than the minimum delay time in use at the
beginning of the section is estimated. The update can be made only when the voice section ends, a non-voice section
follows, and another voice section begins.
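
The update rule of paragraph [0066] can be sketched in Python as an estimator that records every delay but changes the value in use only at the first packet of a voice section. Two points are assumptions of this sketch: the first packet of a session seeds the estimate with its own delay, as in paragraph [0063], and the first packet of a later section is not itself counted toward the minimum applied to that section, which follows the worked example in paragraphs [0074] and [0075]. The class name `SectionMinEstimator` and the sample delays are hypothetical.

```python
class SectionMinEstimator:
    """Freezes the minimum delay estimate for the duration of a voice section."""

    def __init__(self):
        self.history_min = None  # smallest estimated delay observed so far
        self.d_min = None        # estimate currently in use

    def on_packet(self, delay, first_in_section):
        if first_in_section:
            # The estimate in use may change only here (assumptions as
            # stated in the lead-in above).
            self.d_min = delay if self.history_min is None else self.history_min
        # Every packet is recorded, but a smaller mid-section value takes
        # effect only when the next voice section begins.
        if self.history_min is None or delay < self.history_min:
            self.history_min = delay
        return self.d_min


est = SectionMinEstimator()
sp0 = [est.on_packet(d, i == 0) for i, d in enumerate([3, 2, 4])]
sp1 = [est.on_packet(d, i == 0) for i, d in enumerate([1, 2])]
print(sp0, sp1)
```

Keeping the estimate fixed within a section means that every packet of the section is played out with the same total delay, so the decoded voice is not stretched or compressed in the middle of a talk spurt.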

[0067] At the point of receiving the first voice packet P0 of the voice section SP1, the holding time setting unit 104
obtains the estimated value of the minimum delay time dmin from the minimum delay time estimating unit 105 (step
S104).

[0068] Subsequently, the holding time setting unit 104 computes data of holding time da from the aforementioned
equation (1), and supplies the result to the delay unit 103 (step S105).

[0069] In the delay unit 103, a write address is found from the data of holding time da and the read address of the
RAM 103A at that point. Then, the coded voice data of the voice packet P0 are written in the area of the RAM 103A
corresponding to the write address (step S106).
[0070] In the voice section SP1, as in the voice section SP0, an estimated value of delay time di is calculated for
each voice packet Pi received by the receiving unit 101 (step S102). The estimated values of delay time di obtained in the
voice section SP1 are used for estimating the minimum delay time when a voice section SP2 starts at a later time (steps
S103 and S104).
[0071] The operation of the present embodiment will be described in further detail with concrete examples.
[0072] Fig. 10A shows voice packets that are output in sequence from the voice encoder 11 of the source terminal
10. Fig. 10B shows voice packets that are received in sequence by the receiving unit 101 of the destination terminal
100. Fig. 10C shows voice packets that are output in sequence to the voice decoder 110. As shown in Fig. 10B, voice
packets P0, P1, P2, and P3 serially output from the voice encoder 11 reach the receiving unit 101 after being delayed by
d0 (=3s), d1 (=4s), d2 (=2s), and d3 (=2s), respectively. During this period, the estimated value of delay time di output by the delay
time estimating unit 106 and the estimated value of the minimum delay time dmin in the minimum delay time estimating
unit 105 change as follows:



received packet Pi    estimated delay time di    estimated minimum delay time dmin
P0                    3s                         3s
P2                    2s                         2s
P1                    4s                         2s



[0073] Because an estimated value of the minimum delay time dmin is not available in the first voice section SP0,
the sum of the network delay jitter width D and 1s is used as the data of holding time da. Therefore, supposing that
the delay jitter width D is 3s, the data of holding time will be 4s. Given that d0=3s in the example shown, the total delay
time of the serial voice packets P0 to P2 turns out to be d0+da=3s+4s=7s.

[0074] In contrast, in the next voice section SP1, the minimum delay time dmin (=2s) is obtained from the estimated
values of delay time obtained up to that point, and the holding time is determined based on this dmin (=2s).
[0075] Hence, supposing that voice packet P3 is transmitted with delay d3=1s in the voice section SP1 as shown in
Figs. 10A and 10B, the data of holding time will be as follows:



da = dmin + D - d3
   = 2s + 3s - 1s
   = 4s.



[0076] The total delay amount for each voice packet of the voice section SP1 starting from voice packet P3, in turn, 
becomes d3+da=1s+4s=5s. 
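
The arithmetic of paragraphs [0073] to [0076] can be checked with a few lines of Python. The values are those given in the example; the variable names are hypothetical and the sketch simply restates the two cases side by side.

```python
D = 3  # delay jitter width in seconds, as supposed in paragraph [0073]

# First voice section SP0: no minimum delay estimate yet, so d_a = D + 1.
d0 = 3
da_sp0 = D + 1
total_sp0 = d0 + da_sp0

# Voice section SP1: d_min = 2 from earlier packets, first packet delay d3 = 1.
d_min, d3 = 2, 1
da_sp1 = d_min + D - d3
total_sp1 = d3 + da_sp1

print(total_sp0, total_sp1)  # prints "7 5"
```

The total delay drops from 7s to 5s between the two sections, and the value for SP1 equals dmin + D, in agreement with equation (5).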

[0077] Figs. 11A and 11B show an effect of the present embodiment. Supposing that the delay time for a first voice packet
is d0 in a first voice section SP0, the total delay time of each voice packet in the voice section SP0 will be d0+D.
[0078] Suppose that a voice packet is received with the minimum delay time dmin=d3 in the voice section SP0, and that in the
subsequent voice section SP1 the holding time determined based on this minimum delay time is applied. As a result, the
total delay time will be d3+D.

[0079] In conclusion, the present embodiment enables the reduction of the total delay time by deciding the amount
of holding time based on a minimum delay time estimated from the estimated delay times of received packets. Also,
updating the minimum delay time only at the point of receiving a first voice packet after a non-voice section keeps voice
quality from deteriorating. For these reasons, the delay jitter reducing device and reducing method are well suited for
an application that requires real-time operation and high voice quality, such as Internet telephony.

C. Modifications






[0080] The present invention is not limited to the above-described embodiments, and various modifications such as
those exemplified below are possible.

(1) In the above-described embodiments, the present invention is applied to a device that receives data segments
such as packets through the Internet. However, the present invention is not limited to the Internet and may be applied
to a device that receives data segments through a wide-area network such as, for example, a frame relay network.
The present invention can also be applied to a device that receives data segments through a network where delay
jitter is produced in a wireless section, as in a mobile network.

(2) In the above-described embodiments, a packet is shown as an example of a data segment. However, the form
of a data segment is not limited to a packet. A data segment may be anything that includes the transmission time or any
clue information for finding the transmission time. Data segments may be in any unit, such as frames or cells,
depending on the transmission path or the protocol to be used. The protocol may be VoIP as described above
or another protocol such as Voice over Frame Relay (VoFR).

(3) In the above-described embodiments, the present invention is applied to a device that receives voice packets
through a network. However, the present invention is well suited for transmission of not only voice but also video and
other information requiring real-time transmission.

(4) In the above-described second embodiment, the present invention is applied to a real-time voice transmission
in which a voice section and a non-voice section are alternately repeated, where in the voice section voice packets
are consecutively transmitted and in the non-voice section no voice packets are transmitted for
a consecutive period of time. In this embodiment, the holding time of a voice section is decided based on an
estimated value of the minimum delay time acquired in a previous voice section. However, the application of
the present invention is not limited thereto. For example, in another form of data transmission, a first section
and a second section alternate, where in the first section information requiring continuity such as motion
pictures is transmitted and in the second section information not requiring continuity such as still pictures is
transmitted. The present invention can be applied to such a form of data transmission. In this application, the
following procedure for reducing the delay jitter is performed at the destination device:

i) during a period of receiving data segments including information of a second section not requiring continuity,
the delay time of each data segment and a minimum delay time are estimated;

ii) when receiving a first data segment of a first section right after the second section, the delay time of the first
data segment is estimated; and

iii) based on the above estimated value of the minimum delay time and the estimated value of delay time for
the first data segment acquired in ii) above, data of holding time for the first data segment is computed.
The computing method is the same as that described in each of the above embodiments.

(5) In the above-described second embodiment, no packets are transmitted in a non-voice section, but it is also
possible to keep transmitting data designating that a non-voice section is in progress.

(6) In each of the above-described embodiments, the delay jitter width is a fixed value acquired by measuring the
value in advance. However, when the delay jitter width turns out to be larger than the initially supposed amount,
it is possible to update the delay jitter width D used for computing data of holding time so that such a large
delay jitter can be reduced. In the above second embodiment, for example, assume that it follows from equation
(1) that the data of holding time is -3s, the result being computed based on an estimated value of delay time d0
of a first packet in a voice section SPk, an estimated value of the minimum delay time acquired in the previous
voice section SPk-1, and the delay jitter width D. This result arises because the actual delay jitter width is at least 3s larger
than the initially supposed delay jitter width D. Therefore, the delay jitter width D is incremented by 3s, so that
the data of holding time becomes 0s. This renewed delay jitter width D is used for computing data of holding time
from equation (1) in the subsequent voice section SPk+1.

(7) A device for reducing delay jitter according to the present invention can be provided in a relay device of a
network or in a router, for example. This modification serves to reduce delay jitter in the middle of a
transmission path, because a long transmission path leads to a large delay jitter width.

(8) The minimum delay time may be estimated in a certain limited period. To illustrate, the following example can 
be conceived. First, in the beginning of a session, before initiating the voice packet transmission, a training packet 
including a time stamp is repeatedly transmitted from a source terminal to a destination terminal. At the destination 
terminal, a minimum delay time dmin is estimated from estimated values of delay time for these individual training 
packets. Data of holding time da applied to a subsequent voice packet is obtained from the aforementioned equation 
(1) using the dmin. 

(9) In the above-mentioned embodiment, a transmission time of a packet is estimated from a time stamp. However, 
in a case where a time stamp is not included in a packet, it is possible to estimate the transmission time from such 
things as serial numbers included in a packet. 

(10) The embodiments of the present invention include an embodiment of producing and selling a device
which reduces delay jitter as disclosed in the above-mentioned embodiments, an embodiment of
distributing through a telecommunication line a program for making a network-connected computer function as a
delay jitter reducing device as disclosed in the above embodiments, and an embodiment of distributing such a
program recorded in a computer-readable recording medium.
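
The adjustment of the delay jitter width described in modification (6) can be sketched in Python as follows. The sketch assumes that equation (1) has the form da = dmin + D - d, where d is the estimated delay of the first packet of the section, which is the form implied by the derivation of equation (5). The function name `holding_time_with_adaptive_width` and the sample values are hypothetical.

```python
def holding_time_with_adaptive_width(d_first, d_min, jitter_width):
    """Return (holding time, jitter width to use from this section on)."""
    # Assumed form of equation (1): d_a = d_min + D - d_first.
    d_a = d_min + jitter_width - d_first
    if d_a < 0:
        # The actual jitter exceeds the supposed width by at least -d_a,
        # so D is widened by that amount and the holding time becomes 0.
        jitter_width += -d_a
        d_a = 0
    return d_a, jitter_width


# Hypothetical values that give a holding time of -3 before adjustment.
result = holding_time_with_adaptive_width(d_first=8, d_min=2, jitter_width=3)
print(result)
```

With these values the holding time would be -3, so the width is raised from 3 to 6 and the holding time is set to 0, matching the 3s increment in the example of modification (6). The widened value would then be passed in for the next voice section.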

Claims

1. A delay jitter reducing device, comprising:

a receiving unit sequentially receiving chronological data segments through a network;
a time detecting unit for obtaining a reception time of each data segment received by said receiving unit;
transmission time estimating means for estimating transmission time of each data segment received by said
receiving unit;

a delay time estimating unit for estimating a delay time required for transmitting each data segment based on
said reception time and said transmission time of each data segment;

a minimum delay time estimating unit for estimating a minimum delay time in transmitting a data segment
through the network from the estimated values of delay time of a plurality of data segments obtained from said
delay time estimating unit;

relative delay time computing means for obtaining a relative delay time by subtracting said minimum delay
time from the estimated value of delay time of a data segment estimated by said delay time estimating unit; and
delay means for obtaining an amount of holding time corresponding to each data segment by subtracting the
relative delay time of each data segment from a maximum delay time to be reduced, and outputting each data
segment after delaying each data segment for the amount of holding time corresponding to each data segment.

2. A delay jitter reducing device according to claim 1,

wherein said receiving unit receives a plurality of training data segments before receiving a data segment to 
which deference is to be applied; and 

wherein said minimum delay time estimating unit estimates said minimum delay time from estimated values of
delay time for said plurality of training data segments.

3. A delay jitter reducing device according to claim 1,

wherein said minimum delay time estimating unit obtains estimated values of delay time for a plurality of data 
segments that are received in a certain period and estimates said minimum delay time from these estimated values. 

4. A delay jitter reducing device according to claim 3,

wherein said data segment is a data unit representing voice. 

5. A delay jitter reducing device according to claim 1,

wherein said receiving unit alternately receives a data segment belonging to a first section that requires
continuity and a data segment belonging to a second section that does not require continuity; and
wherein said minimum delay time estimating unit estimates, at the point of receiving a first data segment
belonging to the first section, said minimum delay time for data segments that have been received up to the
time point.

6. A delay jitter reducing device according to claim 1,

wherein said delay time estimating unit estimates delay time of said data segment based on transmission
time information or any clue information for transmission time accompanied by said data segment and reception
time thereof.

7. A delay jitter reducing method, comprising:

a receiving process sequentially receiving chronological data segments through a network;
a time detecting process for obtaining a reception time of each data segment received by a receiving unit;
a transmission time estimating process for estimating transmission time of each data segment received by
said receiving unit;

a delay time estimating process for estimating delay time required for transmitting each data segment based
on said reception time and said transmission time of each data segment;
a minimum delay time estimating process for estimating a minimum delay time in transmitting a data segment
through the network from the estimated values of delay time of a plurality of data segments obtained from said
delay time estimating unit;

a relative delay time computing process for obtaining a relative delay time by subtracting said minimum delay
time from the estimated value of delay time of a data segment estimated by said delay time estimating unit; and
a delay process for obtaining an amount of holding time corresponding to each data segment by subtracting
the relative delay time of each data segment from a maximum delay time to be reduced, and outputting each
data segment after delaying each data segment for the amount of holding time corresponding to each data
segment.

8. A program for making a network-connected computer execute:

a receiving process sequentially receiving chronological data segments through the network; 

a time detecting process for obtaining a reception time of each data segment received by a receiving unit; 

a transmission time estimating process for estimating transmission time of each data segment received by 

said receiving unit; 

a delay time estimating process for estimating delay time required for transmitting each data segment based 
on said reception time and said transmission time of each data segment; 

a minimum delay time estimating process for estimating a minimum delay time in transmitting a data segment 
through the network from the estimated values of delay time of a plurality of data segments obtained from a 
delay time estimating unit; 

a relative delay time computing process for obtaining a relative delay time by subtracting said minimum delay 
time from the estimated value of delay time of a data segment estimated by said delay time estimating unit; and
a delay process for obtaining an amount of holding time corresponding to each data segment by subtracting 
the relative delay time of each data segment from a maximum delay time to be reduced, and outputting each 
data segment after delaying each data segment for the amount of holding time corresponding to each data 
segment. 

9. A computer-readable recording medium that has recorded a program for making a network-connected computer
execute: 

a receiving process sequentially receiving chronological data segments through the network; 
a time detecting process for obtaining a reception time of each data segment received by a receiving unit;
a transmission time estimating process for estimating transmission time of each data segment received by 
said receiving unit; 

a delay time estimating process for estimating delay time required for transmitting each data segment based 
on said reception time and said transmission time of each data segment; 

a minimum delay time estimating process for estimating a minimum delay time in transmitting a data segment
through the network from the estimated values of delay time of a plurality of data segments obtained from a 
delay time estimating unit; 

a relative delay time computing process for obtaining a relative delay time by subtracting said minimum delay 
time from the estimated value of delay time of a data segment estimated by said delay time estimating unit; and 
a delay process for obtaining an amount of holding time corresponding to each data segment by subtracting 
the relative delay time of each data segment from a maximum delay time to be reduced, and outputting each 
data segment after delaying each data segment for the amount of holding time corresponding to each data 
segment. 